10 - Artificial Intelligence II [ID:52582]

Okay, there we are. So the current topic is Markov models in their various forms: Markov processes, hidden Markov models, and so on and so forth. We talked about the four primary inference tasks that we have in these kinds of models, namely filtering, i.e. computing the distribution over the current state of the world given the evidence so far; prediction, i.e. estimating a future state of the world given the evidence available so far; smoothing, i.e. computing the distribution over the state of the world at some point in the past given all the evidence so far; and finally most likely explanation, i.e. finding the most likely sequence of states the world has been in given all the evidence so far. In general, we assume that we have a bunch of state variables X and a bunch of

evidence variables E that jointly form a Markov chain, i.e. we have the first-order Markov property and the sensor Markov property, meaning we assume that the evidence variables E_t at every time slice t only depend on the state variables at that particular point in time. In the special case where we have a single state variable X and a single evidence variable E, we have a hidden Markov model, and if we assume stationarity as well, i.e. that the transition model is the same at every time slice, then we can use the matrix forms to represent all of the equations for these kinds of tasks. In general, our goal, especially for filtering, is to find a way to compute the

distribution of the current state given the evidence so far in a recursive manner, such that at every time step we can update our world model and then iterate over all of the time slices; i.e. we want a recursive function F that takes just the latest percept e_t and the distribution at the previous time step. In matrix form, for a stationary hidden Markov model, this looks like a matrix product of the previous distribution, represented as a vector, with the transposed transition matrix and the observation matrix at that particular time step: f_{1:t} = α O_t T^⊤ f_{1:t-1}. Here's the derivation of the whole thing, and it gives us exactly the recursive function we want. In a stationary hidden Markov model, we get this matrix representation. Here it is again: observation matrix, transposed transition matrix, recursive call on the distribution at the previous time step.
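The matrix-form update just described can be sketched in a few lines of NumPy. The rain/umbrella model below is an illustrative assumption, not taken from the lecture slides:

```python
import numpy as np

# Transition model P(X_t | X_{t-1}); row i = previous state i.
# (Rain/umbrella numbers are illustrative, not from the lecture.)
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])

# Sensor model P(umbrella | X_t) for X_t in {rain, no rain}.
P_e_given_x = np.array([0.9, 0.2])

def filter_step(f_prev, observed):
    """One recursive update: f_{1:t} = alpha * O_t * T^T * f_{1:t-1}."""
    likelihood = P_e_given_x if observed else 1.0 - P_e_given_x
    O = np.diag(likelihood)        # observation matrix O_t for the percept e_t
    f = O @ T.T @ f_prev           # O_t T^T f_{1:t-1}
    return f / f.sum()             # normalization takes the role of alpha

f = np.array([0.5, 0.5])           # uniform prior over the initial state
for e in [True, True]:             # two "umbrella seen" percepts
    f = filter_step(f, e)
# f is now approximately [0.883, 0.117]
```

The key point is that each step needs only the latest percept and the previous vector f, exactly the recursive shape described above.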

I think it makes sense to consider what happens if we drop some of the properties of a stationary hidden Markov model. For example, if we look at this equation here, what would we have to change about it if we assume that the hidden Markov model is not stationary? What do we need to change then? Yes? Yes, in what way would we need to change the transition matrix? Right, it wouldn't be the same for every t, so we would basically have to add an index t here.
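As a quick sketch of what that time index means in code (the 2-state matrices below are made-up assumptions, not from the lecture):

```python
import numpy as np

# One (made-up) transition matrix per time step instead of a single T.
Ts = [np.array([[0.7, 0.3], [0.3, 0.7]]),   # T_1
      np.array([[0.9, 0.1], [0.2, 0.8]])]   # T_2 differs from T_1
P_e_given_x = np.array([0.9, 0.2])          # sensor model as before

def filter_step(f_prev, T_t, observed):
    """f_{1:t} = alpha * O_t * T_t^T * f_{1:t-1} -- note the index on T."""
    O = np.diag(P_e_given_x if observed else 1.0 - P_e_given_x)
    f = O @ T_t.T @ f_prev
    return f / f.sum()

f = np.array([0.5, 0.5])
for T_t, e in zip(Ts, [True, True]):        # a different T_t at each step
    f = filter_step(f, T_t, e)
```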

So we would have a different transition matrix at every single inference step. Okay, so for a non-stationary HMM we would have a time index t on the transition matrix: f_{1:t} = α O_t T_t^⊤ f_{1:t-1}. What happens if we don't have a hidden

Markov model? I.e. we have more state variables and more evidence variables. Sorry? Sorry? A T with some index? That's what we already get if we just drop stationarity. Or did I misunderstand? Yes? That would be one way, yes. We could introduce a transition matrix and an observation matrix for every element of the domain of the, sorry, for every single one of those X_t and for every single one of those E_t, which basically just means we stop using matrices altogether and just use this formula, the more general one here. Yeah?

Exactly, so the X_{t-1} would have to have all of the, sorry, can you repeat that? I think I misunderstood. Ah, in here, instead of X_{t-1}, we would have X_1 through X_{t-1}. Not quite, that is what we would get if we dropped the Markov property. Right? If we assume the first-order Markov property, we only need to care about this particular state. If we drop the Markov property altogether, we would have to add all of the X_1 through X_{t-1} on the right side of that conditional probability. We're still interested in the particular state at time t, but then we exploit the Markov property in this derivation. Where do we exploit it? Hello? Oh, basically in the first line already? No. Here. Yeah, here in the fourth row, where we only marginalize over the previous time step. Right? So here we marginalize over X_{t-1}. And if we drop the Markov property, we would have to marginalize over all of X_1 through X_{t-1} in here, which gives us a giant sum over all previous time steps.
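To see what that giant sum looks like concretely, here is a brute-force sketch of filtering without the first-order Markov property. The history-dependent transition model is made up for illustration, and the sensor Markov property is still assumed; note the runtime is exponential in t, which is exactly why the Markov property matters:

```python
from itertools import product

STATES = [0, 1]
P_x1 = {0: 0.5, 1: 0.5}                     # prior over the initial state

def p_trans(x_t, history):
    """P(x_t | x_1..x_{t-1}): an arbitrary history-dependent model --
    the more often we were in state 1, the likelier state 1 becomes."""
    p1 = (1 + sum(history)) / (2 + len(history))
    return p1 if x_t == 1 else 1.0 - p1

def p_sensor(e_t, x_t):
    """Sensor model P(e_t | x_t): the percept matches the state 90% of the time."""
    return 0.9 if e_t == x_t else 0.1

def filter_brute_force(evidence):
    """P(X_t | e_1:t) by summing over every sequence x_1..x_{t-1}."""
    t = len(evidence)
    dist = {x: 0.0 for x in STATES}
    for seq in product(STATES, repeat=t):   # all 2^t state sequences
        p = P_x1[seq[0]] * p_sensor(evidence[0], seq[0])
        for k in range(1, t):
            p *= p_trans(seq[k], seq[:k]) * p_sensor(evidence[k], seq[k])
        dist[seq[-1]] += p                  # marginalize out the history
    z = sum(dist.values())
    return {x: p / z for x, p in dist.items()}
```

With the Markov property, this exponential sum collapses into the single marginalization over X_{t-1} that the derivation exploits.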

So that would be if we drop the Markov property. If we introduce additional state variables and additional evidence variables, we would basically just have to add all of them in here and in here. So instead of just having a single E and a single X, we would have a sequence of those variables per time slice. By abuse of notation, that's basically what's already written here: if you just think of every X_t or E_t or X_{t-1} as a sequence of variables, then this formula does not change at all. It's just that, strictly speaking, we can't do that.
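One way to make that abuse of notation literal is to encode the tuple of state variables as a single joint variable over the product domain; then even the matrix form goes through unchanged, but the matrices grow exponentially with the number of variables, which is why this is not practical. A sketch with two binary state variables (the factored transition model is an assumption chosen only to keep the example small):

```python
import numpy as np
from itertools import product

JOINT = list(product([0, 1], repeat=2))     # states of (X1, X2): 4 joint states

def joint_transition(p_a, p_b):
    """4x4 transition matrix built from two independent per-variable 'stay'
    probabilities (independence is assumed purely for illustration)."""
    T = np.zeros((len(JOINT), len(JOINT)))
    for i, (a, b) in enumerate(JOINT):
        for j, (a2, b2) in enumerate(JOINT):
            T[i, j] = ((p_a if a2 == a else 1 - p_a)
                       * (p_b if b2 == b else 1 - p_b))
    return T

T = joint_transition(0.8, 0.6)
# Each row is a proper distribution, so the usual HMM recursion applies --
# but with n binary variables the matrix would be 2^n x 2^n.
assert np.allclose(T.sum(axis=1), 1.0)
```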

Part of a video series
Accessible via: open access
Duration: 01:20:40 min
Recording date: 2024-05-23
Uploaded on: 2024-05-23 19:39:02
Language: en-US
